This is the first project of the Edwisor Data Science career path.
This project is a supervised learning regression problem. The goal is to predict the number of bikes that will be rented, using the past two years of data.
Several regression machine learning algorithms are tested to predict the number of bikes, and mean absolute error, mean squared error and R squared are calculated to check the performance of these algorithms.
First, the behavioural pattern of customers is studied from the given data, which contains factors such as season, year, month, weather condition, temperature and wind speed. Then the factors that play an important role in the number of bikes rented are selected and fed to the algorithms.
This kind of analysis lets us estimate future demand, so stock can be prepared according to the predicted numbers, which in turn increases the company's revenue.
Here the required libraries and the data are imported.
We have 731 observations, covering 2 years, and 15 features: dteday, season, yr, mnth, holiday, weekday, workingday, weathersit, temp, atemp, hum, windspeed, casual, registered and cnt.
The target variable is 'cnt', which is the total number of bikes rented.
import warnings
warnings.filterwarnings("ignore")
#Importing required libraries
import numpy as np
import pandas as pd
#Importing libraries for plotting
import matplotlib.pyplot as plt
import seaborn as sns
#Reading dataset
bikeRent = pd.read_csv("https://s3-ap-southeast-1.amazonaws.com/edwisor-india-bucket/projects/data/DataN0103/day.csv",
index_col=0)
#Get Dimensions of dataset
bikeRent.shape
#Get first 5 rows
bikeRent.head()
#Statistical analysis of data
bikeRent.describe()
#Get names of column of dataset
bikeRent.columns
Before doing the analysis we first need to identify the categorical features and replace their numerical codes with the corresponding human-readable values.
This is done for a better understanding of the features during the analysis.
#Create new dataset for Exploratory Data Analysis
data = bikeRent.copy()
#Changing numeric codes to categorical labels and renaming columns to descriptive names
#data['Index'] = data['instant']
data['Date'] = data['dteday'].astype('category')
data['Season'] = data['season'].replace([1,2,3,4],['Spring','Summer','Fall','Winter']).astype('category')
data['Year'] = data['yr'].replace([0,1],['2011','2012']).astype('category')
data['Month'] = data['mnth'].astype('category')
#holiday: 0 = not a holiday, 1 = holiday
data['Holiday'] = data['holiday'].replace([0,1],['No holiday','Holiday']).astype('category')
data['Weekday'] = data['weekday'].astype('category')
#workingday: 1 = neither weekend nor holiday, 0 = otherwise
data['Working Day'] = data['workingday'].replace([0,1],['Holiday','Working day']).astype('category')
data['Weather Condition'] = data['weathersit'].replace([1,2,3,4],['Clear, Few clouds, Partly cloudy, Partly cloudy',
'Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist',
'Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds',
'Heavy Rain + Ice Pallets + Thunderstorm + Mist, Snow + Fog']).astype('category')
#temp and atemp are min-max normalised; rescale them back to degrees Celsius
#temp was normalised with min -8 and max +39, atemp with min -16 and max +50
data['Temperature'] = (data['temp']*(39 + 8)) - 8
data['Feeling Temperature'] = (data['atemp']*(50 + 16)) - 16
data['Humidity'] = data['hum'] * 100        #hum was divided by 100
data['Wind Speed'] = data['windspeed'] * 67 #windspeed was divided by 67
data['Casual Users'] = data['casual']
data['Registered Users'] = data['registered']
data['Count'] = data['cnt']
#Drop the original encoded columns, keeping only the relabelled ones
data = data.drop(columns = bikeRent.columns)
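As a quick, purely illustrative sanity check of the relabelling (not part of the original workflow), the new categorical columns can be inspected:
#Illustrative check of the relabelled dataset (uses only the columns created above)
print(data.dtypes)
print(data['Season'].value_counts())
print(data['Weather Condition'].value_counts())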
Here the various categorical features are analysed and visualised with respect to the target feature.
sns.histplot(data['Count'], kde=True)   #histplot replaces the deprecated sns.distplot
#Probability distribution of the target variable 'Count' is close to normally distributed
#Separating columns by category
Categorical = ['Date','Season','Year','Month','Holiday','Weekday','Working Day','Weather Condition']
Continuous = ['Temperature','Feeling Temperature','Humidity','Wind Speed','Casual Users','Registered Users','Count']
#distribution of categorical variables with target variable
print(data.groupby('Season')['Count'].sum())
#Number of bikes hired season wise
print()
plt.gcf().set_size_inches(12,8)
sns.barplot(data=data,x='Season',y='Count',hue = 'Weather Condition')
Fall is the season in which most of the bikes were hired, followed by summer.
Most bikes were hired when the weather was clear or partly cloudy.
No bikes were hired in the heavy rain and ice pellets weather condition.
plt.gcf().set_size_inches(12,6)
sns.barplot(data=data,x='Year',y='Count')
#Year wise, most of the bikes were hired in 2012, possibly because of the growing popularity of the company
plt.gcf().set_size_inches(12,6)
sns.barplot(data=data,x='Month',y='Count')
#Most of the bikes were hired in June, July, August and September, which are the summer and fall months
plt.gcf().set_size_inches(12,6)
sns.barplot(data=data,x='Weekday',y='Count',hue = 'Holiday')
#Most of the bikes were hired on day 3 of the week
#Relation of continuous variable with target variable
sns.pairplot(data = data,
x_vars = ['Temperature','Feeling Temperature','Humidity','Wind Speed','Casual Users','Registered Users'],
y_vars = ['Count'])
#Distribution of continuous variable
data[Continuous].hist(bins = 50,figsize = (15,10))
Here missing value analysis is done. As per the analysis there are no missing values in the data.
#Missing value analysis
data.isnull().sum()
Outliers are values that lie far from the other observations; they can distort the mean and standard deviation and introduce high variability. These values should be handled either by removing them or by replacing them with suitable values such as the mean, median or mode.
The box plot method is used to detect the outliers in the data.
Outliers are detected in Humidity, Wind Speed and Casual Users.
#Detection of outliers
var1 = ['Temperature','Feeling Temperature','Humidity','Wind Speed']
plt.gcf().set_size_inches(10,5)
sns.boxplot(data = data[var1])
#from boxplot the outliers can be seen in Humidity and Wind speed
var2 = ['Casual Users','Registered Users']
plt.gcf().set_size_inches(10,5)
sns.boxplot(data = data[var2])
#here the outliers can be seen in Casual users
The outliers are removed using the quartile (IQR) method: values beyond 1.5 times the inter-quartile range from the quartiles are dropped.
#Outlier removal using the IQR method (values outside Q1 - 1.5*IQR and Q3 + 1.5*IQR are dropped)
for col in ['Humidity','Wind Speed','Casual Users']:
    q75, q25 = np.percentile(data[col], [75 ,25])
    iqr = q75 - q25
    lower = q25 - (iqr*1.5)
    upper = q75 + (iqr*1.5)
    data = data.drop(data[data[col] < lower].index)
    data = data.drop(data[data[col] > upper].index)
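As a quick illustrative check, the number of observations remaining after outlier removal can be inspected:
#Rows remaining after dropping outliers (the dataset started with 731 observations)
print(data.shape)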
Here the correlation between all the continuous variables, including the target variable, is checked using a heatmap.
From the heatmap it is concluded that Temperature and Feeling Temperature are highly correlated with each other.
Registered Users is also highly correlated with the target variable; since casual and registered users sum to the count, keeping them would leak the target into the model.
Date and Casual Users are likewise not required to carry forward to the machine learning algorithms.
So four features are dropped: Date, Feeling Temperature, Registered Users and Casual Users.
data[Continuous].corr().style.background_gradient(cmap='coolwarm')
CorrMat = data[Continuous].corr()
plt.gcf().set_size_inches(10,8)
sns.heatmap(CorrMat,annot =True)
sns.pairplot(data[Continuous])
#From the above plots it can be seen that:
#Temperature and Feeling Temperature are highly correlated
#Registered Users is also highly correlated with the target variable Count
#So Feeling Temperature and Registered Users are dropped from the dataset
#After experimenting, Date and Casual Users are also dropped to prevent the model from overfitting
data.columns
data = data.drop(columns=['Date','Feeling Temperature','Registered Users','Casual Users'],axis=1)
data['Season'] = bikeRent['season'].astype('category')
data['Year'] = bikeRent['yr'].astype('category')
data['Month'] = bikeRent['mnth'].astype('category')
data['Holiday'] = bikeRent['holiday'].astype('category')
data['Weekday'] = bikeRent['weekday'].astype('category')
data['Working Day'] = bikeRent['workingday'].astype('category')
data['Weather Condition'] = bikeRent['weathersit'].astype('category')
#data = pd.get_dummies(data)
#Using the original numeric codes as categorical features for the modelling input
data.head(2)
Dividing the dataset into 80% training data and 20% test data.
from sklearn.model_selection import train_test_split
xTrain,xTest,yTrain,yTest = train_test_split(data.loc[:,data.columns != 'Count'],data['Count'] ,test_size = 0.2)
#Splitting the dataset into train and test sets with a 20 percent test ratio
xTrain.head(2)
Here different models are developed to make the predictions, and different evaluation metrics are used to compare their performance. This helps to build confidence in the predictions.
The models are validated using K-fold cross validation, and the scores are compared to find the best performing model.
The data is tested with five regression algorithms: Linear Regression, Decision Tree, Random Forest, K-Nearest Neighbours and Lasso.
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import Lasso
models = []
models.append(('LR ',LinearRegression()))
models.append(('DTR',DecisionTreeRegressor()))
models.append(('RFR',RandomForestRegressor()))
models.append(('KNN',KNeighborsRegressor()))
models.append(('LSO',Lasso()))
#Using 5 regression algorithms to check which algorithm is best for this dataset
from sklearn.model_selection import KFold,cross_val_score
def ModellingAndEvaluationWithCrossValidation(models,scoring):
    for name,model in models:
        kfold = KFold(n_splits=10)   #10-fold cross validation (no shuffling)
        Scores = cross_val_score(model,xTrain,yTrain,scoring=scoring, cv=kfold)
        #sklearn returns negated error scores, so they are negated back and the square root is taken;
        #the printed cross-validation values are therefore on a square-root scale
        #(RMSE for mean squared error, sqrt(MAE) for mean absolute error)
        Scores = np.sqrt(-Scores)
        print(name ,' : ' ,Scores.mean())

def ModellingAndEvaluation(models,scoring):
    for name,model in models:
        model.fit(xTrain,yTrain)
        predict = model.predict(xTest)
        print(name ,' : ' ,scoring(yTest,predict))
#Created functions to evaluate the models with cross validation and on the test set
All five models are evaluated using these performance metrics. Mean absolute error and mean squared error are evaluated using K-fold cross validation, and the obtained scores are compared to assess the performance of the models. Note that the cross-validation helper prints the square root of each score, so the cross-validation numbers below are on a square-root scale and are not directly comparable with the test-set numbers.
Mean absolute error scores with Cross validation:-
LR : 24.756516372481055
DTR : 24.42818446609717
RFR : 21.72871879994421
KNN : 31.88993961677325
LSO : 24.74685025048984
Mean absolute error scores with test set:-
LR : 673.0230994298247
DTR : 667.4705882352941
RFR : 515.1220588235293
KNN : 1093.6750000000002
LSO : 671.4968494943123
from sklearn.metrics import mean_absolute_error
print("MAE scores with Cross validation:-")
ModellingAndEvaluationWithCrossValidation(models,'neg_mean_absolute_error')
print()
print("MAE scores with test set:-")
ModellingAndEvaluation(models,mean_absolute_error)
#Mean absolute error scores
#Here we can see that Random Forest has the lowest error, while KNN has the highest
Mean squared error scores with Cross validation:-
LR : 818.0101656656103
DTR : 867.841221570945
RFR : 667.7082018846622
KNN : 1218.8729421997991
LSO : 818.1679241703506
Mean squared error scores with test set:-
LR : 778193.4448160154
DTR : 875236.8014705882
RFR : 557750.5021323529
KNN : 1745766.0673529413
LSO : 775718.2731361163
from sklearn.metrics import mean_squared_error
print("MSE scores with Cross validation:-")
ModellingAndEvaluationWithCrossValidation(models,'neg_mean_squared_error')
print()
print("MSE scores with test set:-")
ModellingAndEvaluation(models,mean_squared_error)
#Mean squared error scores
#Tried and tested with different ratios of train and test sets
#From the above scores we can see that the Random Forest regressor gives the lowest errors for the bike rental count prediction
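For reference, a minimal sketch of what the two error metrics compute, equivalent to sklearn's mean_absolute_error and mean_squared_error (the helper names mae and mse are illustrative and not used elsewhere in this notebook):
#Illustrative definitions of the two error metrics used above
def mae(actual, predicted):
    #Mean Absolute Error: average absolute deviation, in the same units as 'Count'
    return np.mean(np.abs(actual - predicted))
def mse(actual, predicted):
    #Mean Squared Error: average squared deviation, penalises large errors more strongly
    return np.mean((actual - predicted) ** 2)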
MAPE scores with test set:-
LR : 22.059973855505874
DTR : 23.094885769820984
RFR : 20.002969760733535
KNN : 38.408839925256736
LSO : 21.95474677633306
def MAPE(actual,predicted):
    #Mean Absolute Percentage Error, in percent
    return np.mean((abs(actual-predicted))/actual)*100
print("MAPE scores with test set:-")
ModellingAndEvaluation(models,MAPE)
R squared scores with test set:-
LR : 0.830750294699801
DTR : 0.7753703908767058
RFR : 0.9041022858422911
KNN : 0.5952925705228493
LSO : 0.8308547664967059
from sklearn.metrics import r2_score
print("R squared scores with test set:-")
ModellingAndEvaluation(models,r2_score)
MAE = 447.4333823529412
MSE = 392496.15887647064
MAPE= 16.850845290622114
R2 = 0.9017552147082286
From the above training and evaluation it is concluded that the Random Forest regressor is the best fit for this dataset; the values above are the results for the different evaluation metrics used. The R squared value is quite high, which indicates that the model successfully explains approximately 90 percent of the variance in the data.
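For reference, R squared compares the model's squared prediction error against that of a baseline that always predicts the mean. A minimal illustrative sketch (the helper name r_squared is an assumption; it is equivalent in spirit to sklearn's r2_score):
#Illustrative definition of the R squared score (works on numpy arrays or pandas Series)
def r_squared(actual, predicted):
    ss_res = np.sum((actual - predicted) ** 2)          #residual sum of squares
    ss_tot = np.sum((actual - np.mean(actual)) ** 2)    #total sum of squares around the mean
    return 1 - ss_res / ss_tot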
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
RFR = RandomForestRegressor()
RFR.fit(xTrain,yTrain)
predict = RFR.predict(xTest)
MAE = mean_absolute_error(yTest,predict)
MSE = mean_squared_error(yTest,predict)
MAPEScore = MAPE(yTest,predict)   #stored under a new name so the MAPE function is not overwritten
R2 = r2_score(yTest,predict)
print('MAE = ',MAE)
print('MSE = ',MSE)
print('MAPE= ',MAPEScore)
print('R2 = ',R2)
#Applied Random Forest regression to the data and calculated the MAE, MSE, MAPE and R squared values
#Compare actual and predicted values on the test set
output = pd.DataFrame({'Actual value':yTest,'Predicted value':predict})
#Save the trained model to disk
import pickle
pickle.dump(RFR, open('model.pkl','wb'))
#Download the saved model when running in Google Colab
from google.colab import files
files.download('model.pkl')
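To reuse the saved model later, it can be loaded back with pickle and applied to data with the same columns as xTrain. A minimal illustrative sketch ('model.pkl' is the file written above; loadedModel is a hypothetical name):
import pickle
#Load the trained Random Forest model back from disk
with open('model.pkl','rb') as f:
    loadedModel = pickle.load(f)
#Sanity check: predict on the existing test features
print(loadedModel.predict(xTest)[:5])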